maligned in recent years. They differ from criterion-referenced tests
standard or specific performance objective while in a norm-referenced
test assessment is made in comparison to other students taking the
same test. Standardized tests can be abused in some of the following
ways: failing to recognize limitations within the testing situation
that obscure the "true" level of competence; making decisions on the
assumption that the score derived from the test tells all; making
decisions on the assumption that reading and intelligence tests
measure exclusive domains; assuming that a standardized test can give
specific direction to an instructional program; interpreting results
without reference to the composition of the norm group; making the
assumption that anyone below the 50th percentile is a disabled
learner; and treating a grade score on a reading test as a functional
reading level. However, standardized tests do have many positive
further diagnostic needs, and as a way to generate hypotheses
regarding instructional needs. (MKH)
THE STANDARDIZED TEST: USES AND ABUSES

The issue surrounding the use and abuse of standardized tests has
frequently proven more an emotional than an intellectual one. As so often
happens with emotional issues, standardized test discussions tend to
dichotomize people. This is reflected in debates like "criterion vs
norm-referenced testing", "to brainwatch or to educate", "quantitative
vs qualitative aspects of education". The ultimate in emotional outbursts
is exemplified in the title of a kit recently developed by the National
Council of Teachers of English, "A First-Aid Kit for the Test-Wounded".
A close rival is reference to tests as "prejudicial educational traps" in
an article entitled "Shot Down by the Tests" (Skinner, 1968, p. 13). The
defiant tones of one camp, denouncing the possibility that a worthwhile
educational objective can be quantified, are matched only by the hushed
tones of the other camp, who view the score derived from a succession of
squiggles as comprising nothing less than a magic number - a magic number
that can be viewed in absolute terms. Risking the role of a "fence-sitter",
I submit that tests, with a few notable exceptions, are neither good nor
bad. Like atomic energy, standardized tests can serve useful functions;
they can also easily be used against a child. This point must be
emphasized - a test will never "wound" anyone nor will it "trap" anyone;
the decision of "wounding" or "trapping" lies within the prerogative of
the test user.

In this discussion I propose to summarize very briefly the
differentiation between norm- and criterion-referenced testing, abuses of
norm-referenced (standardized) tests, and their uses. The reversal of the
key words "abuses" and "uses" in no way represents a bias, but rather a
concern to end the discussion on a positive note.
DIFFERENTIATION BETWEEN CRITERION- AND NORM-REFERENCED TESTING



Tests are designed to answer questions that educators raise.
"What is Jason's specific problem in reading?" "How well is my class doing
in relation to classes elsewhere?" "Did I do as good a job teaching this
year as last year?" "Should I shift emphasis in my teaching program in
word identification skills?" Which of the questions are to be answered
will be determined by the choice of type of test: criterion- or
norm-referenced.

Space does not permit a detailed discourse on criterion-referenced
testing, nor is it necessarily within the interest of the present
discussion to do so. However, a brief differentiation between the two
types of testing is presented.

The major difference between norm- and criterion-referenced tests
lies in the way the items are developed and selected and the way in which
the test is to be interpreted. Criterion-referenced tests imply that a
student is assessed in comparison to an absolute standard rather than in
comparison to other students taking the same test (Good, Biddle and
Brophy, 1975, p. 155). These tests are developed to yield measurements
that can be interpreted directly in terms of specific performance
objectives. For example, "The learner can identify the main idea of a
paragraph 95 percent of the time". Note that there is no implication as
to whether this is good or bad. Only the teacher can decide this. Note
also that the conclusion based on testing the objective clearly implies
direct instructional needs. From this standpoint the criterion-referenced
test is useful in aiding "on the spot" decisions regarding instruction.

Since criterion-referenced tests have their base in performance
objectives, the tests are subject to the same weaknesses and strengths as
performance objectives themselves. Such testing involves breaking down a
subject area into small instructional units so that all students can
master a commonly agreed upon set of skills. One of the major obstacles
facing proponents of criterion-referenced testing has been the question
of agreement on the domain of skills to be included. Another problem has
been the lack of agreement on criterion levels. (Is 80 percent or 100
percent mastery the minimum performance level?) Perhaps the greatest
hazard has been the temptation to test those skills which submit readily
to statements in performance terms.

Norm-referenced testing is not concerned with 80 or 100 percent
mastery levels; it provides meaning to a student's score only by
comparison of his test performance with that of others on the same test
rather than comparison against an absolute standard. Whereas
criterion-referenced testing denotes high scores, or "bunching" of scores
at the top level, norm-referenced testing spreads students' scores as far
as possible. This is accomplished by posing questions that roughly 50
percent of the students respond to correctly and that are responded to
correctly more often by students who attain high total scores than by
those who achieve a relatively low total score. Norm-referencing, by
definition, denotes that equal numbers of students in the norm sample
score above and below grade level.
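
The two selection rules just described (items that about half the group
answers correctly, and items answered correctly more often by high scorers
than by low scorers) correspond to what is usually called item difficulty
and item discrimination. What follows is only a minimal sketch in Python,
not a procedure taken from any test publisher. The function name
`item_statistics`, the 0/1 response matrix, and the default comparison of
the top and bottom 27 percent of scorers are all assumptions made for
illustration.

```python
def item_statistics(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) for every item.

    responses: list of rows, one per student; each row holds 1 (correct)
               or 0 (incorrect) for every item.
    group_fraction: share of students placed in the high and low scoring
               groups (an assumed convention, not taken from the paper).
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Rank students by total score, lowest first.
    totals = [sum(row) for row in responses]
    order = sorted(range(n_students), key=lambda i: totals[i])
    k = max(1, round(n_students * group_fraction))
    low, high = order[:k], order[-k:]

    stats = []
    for j in range(n_items):
        # Difficulty: proportion of all students answering the item correctly.
        difficulty = sum(row[j] for row in responses) / n_students
        # Discrimination: proportion correct among high scorers minus
        # proportion correct among low scorers.
        p_high = sum(responses[i][j] for i in high) / k
        p_low = sum(responses[i][j] for i in low) / k
        stats.append((difficulty, p_high - p_low))
    return stats


# Hypothetical data: four students, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for number, (difficulty, discrimination) in enumerate(
        item_statistics(responses, group_fraction=0.5), start=1):
    print(number, difficulty, discrimination)
```

In this invented data set the second item is answered correctly by half
the students and separates the high and low scorers completely, which is
the kind of item a norm-referenced test constructor would tend to retain.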

Compared to criterion-referenced tests, norm-referenced tests are
typically, although not exclusively, designed to evaluate more global
aspects of the curriculum and thus have less relevance to immediate
instructional application.

In summary, then, what is it that we are attempting to accomplish
in standardized (norm-referenced) testing? Basically, we are observing a
sampling of a student's behavior from which we are making an estimate of
his "true" level of competence. This estimate may be used for predictive
purposes, and to determine what changes in curriculum and instruction
should be made. Whatever the purpose, the assumption generally is that
testing conditions, the test, and the interpretation of the test are
optimum. Blind acceptance of scores from standardized tests has often led
to varying degrees of abuse.



ABUSES OF STANDARDIZED TESTS

Failure to recognize limitations within the testing situation that obscure
the "true" level of competence.

Any condition within the test situation that reduces the reliability
of a given test masks or obscures the competency level attained by the
testee. Perhaps the most critical factor affecting reliability has to do
with anxieties generated within the testee. Seller (1970) identified fear
of test situations as one of the anxiety-producing variables. The aura or
mystique shrouding the test, he felt, created anxieties which accounted
for reduced productivity. He reported one survey which revealed that some
adult applicants for testing thought that taking a test was like a medical
examination requiring various stages of undress. In the same survey one
applicant interpreted "test battery" as demanding knowledge of electricity.

Anxieties spring from the most unsuspected sources. A precocious
kindergarten child overheard the psychologist tell the teacher that he
would be back in the afternoon to "wind up" the rest of the testing. The
parent of the child called the school during lunch hour reporting her
child's reluctance to go to school because a stranger was going to "wind
her up".

A further factor that affects both reliability and validity of tests
is the format in which the item is cast. Comprehension, for example, is
measured in many ways. If a child is working the items on the Stanford
Reading Test, he will be given a cloze paragraph with response choices at
the bottom of the paragraph. If he is doing the comprehension test from
the Monroe-Sherman Diagnostic Reading Test, he will be confronted with a
question first, will then read the paragraph, and then circle one of
several choices to answer the question. If he is tested on comprehension
with the Durrell Analysis of Reading Difficulty, he will simply try to
recall the ideas he has read in a paragraph.

Misinterpretations of test results frequently stem from another
type of item which requires the testee to circle the one of four or five
words that best corresponds with a given pictorial stimulus. Frequently,
the obscurity of the picture or the experiential background of the child
limits his ability to identify the picture correctly and, of course,
ultimately his ability to circle the correct word.

Frequently tests are interpreted without reference to the response
level required of the testee. For example, there is a substantial
difference between mere recognition and identification. If the testee is
required to recognize the word "funny" in the series "fair", "funny",
"flew", "folly", his response level is different from the requirement
that he identify the word "funny" without aid.

Closely linked with format of the test item is the inclusion of value
or direct experience-based items. Schiller (1974) refers to an item on the
WISC which states, "If your mother sends you to the store for a loaf of
bread and there is none, what do you do?" The child who answers, "I go
back home", is considered to be intellectually inferior to the child who
says, "I go to another store". The point is that many children in rural
areas have only one store to go to. It is also conceivable that a child in
a city gets instructions to go to a specific store and feels that to go to
another could be interpreted as disobedience. Dreskin (1965) gives an
example of how choice of vocabulary on intelligence tests often favors
children from "better-class homes". When children from "better-class"
homes and "lower-class" homes were confronted with the analogy, "Symphony
is to composer as book is to (paper, sculptor, author, musician, man)",
the first group scored correctly 81 percent of the time against 52 percent
for the second group. When the analogy was reworded to "Baker goes with
bread as carpenter goes with (saw, house, spoon, nail, man)", the two
groups scored evenly in checking "house". If the objective of the item is
to ascertain the learner's facility to deal with analogy, the word
"symphony" certainly appears to have set up a barrier to differentiating
those learners who can handle analogy from those who cannot.

A further source of misinterpretation can arise from the fact that
some may not be familiar with formal test-taking behavior. Ruddell (1974)
attributes low achievement scores of some children to:

     pupil unfamiliarity with labels and concepts used
     in test situations, i.e., failure to understand
     the task required to respond in test items; and
     unfamiliarity with labels and concepts being
     evaluated by the instrument (p. 384).

A final source of misinterpretation may arise from failure to
recognize that certain items on comprehension tests can be answered without
dependence on the written passage. Tuinman (1973) found that probabilities
of correct responses on test passages not read by students in grades four
to six were well above the expected chance level. Average probabilities of
correct responses with no passage present ranged between .32 and .50.

In summary, failure to recognize limitations within the testing
situation can well obscure the testee's "true" level of competence. The
varying formats, content and test conditions can only too easily lead to
the situation described by Dreskin (1965). He reports the IQ scores of a
girl whose father was in the armed forces and had been stationed across
Canada. The parents were understandably confused by the fact that their
daughter's intelligence ranged all the way from low average to superior
depending on where she had been tested: 110 in British Columbia, 90 in
Manitoba, 115 in Ontario and 125 in New Brunswick.

Making decisions on the assumption that the score derived from the test
tells all.

A test score can be interpreted only in the light of the degree to
which the items sample the domain of the construct represented. For
example, a silent reading test does not tell us nearly everything about
the testee's reading. Reading is a highly complex cognitive and affective
process. What we are getting from the student's silent reading is a small
sampling of the product of his reading - very little about the process.
Again, item format has some bearing on this. It would appear that analysis
of a test cast in cloze format yields more information on process than
questions answered subsequent to paragraph reading. The anomalous nature
of part scores on reading tests is nowhere more evident than it is in the
case of reading comprehension tests (Traxler, 1970, p. 223). Many
comprehension tests consist largely of factual questions while others
emphasize aspects of critical, inferential and creative reading. Meaning
can be attributed to the learner's score only in the light of a close
examination of the test domain.

Even rate of reading is not the simple matter it appears to be
superficially (Traxler, 1970, p. 222). We have to be concerned with rates
rather than one rate. Content is an important determining variable of rate
from the standpoint both of concept familiarity and load and of personal
interest or motivation. Further, interval of time is an important factor
in determining reading rate. Traxler recommends a sampling of at least
three to five minutes.



Making decisions on the assumption that reading and intelligence tests
measure exclusive domains.

It is not uncommon to hear school personnel reflect on the cumulative
report of a child in the following manner:

     I don't understand; Charles had an IQ of 110 when
     he was tested in grade two, 101 in grade four, and
     now two years later he is down to 91, almost a
     candidate for a special class. No wonder he has
     trouble reading.

There are at least two related problems. First, if the child has
been given a group test, which is likely, his inability to read is going
to reflect cumulatively in the intelligence tests. This does not take into
account additional problems of failure complexes and increasing lack of
motivation.

Further, intelligence tests sample content closely akin to reading
comprehension tests. After all, reading is thinking. Traxler feels that
the better and more searching the reading test is, the greater this
limitation becomes (p. 224). So, scores on reading tests really represent
a composite of reading and intelligence.

Assuming that a standardized test can give specific direction to an
instructional program.

Because of the highly generalized nature of most standardized
achievement tests, they do not measure the specific objectives for a
particular student (Ruddell, 1974, p. 384). At best this very global
assessment can give very general directions for instruction. Two possible
exceptions come to mind. First, if, say, a reading comprehension test
samples a wide spectrum of comprehension tasks ranging all the way from
factual recall to inferential reading, a careful item analysis will aid in
revealing specific instructional needs. Second, if the test is designed to
yield a diagnostic profile, specific instructional trends can be revealed.
Generally, however, group tests are designed to measure the achievement of
groups rather than the educational placement of individuals.



Interpreting results without reference to the composition of the norm group.



It has been mentioned earlier that an individual's score on a
standardized test is interpreted in comparison with the performance of the
norm group. Ruddell (1974) states explicitly that:

     Because the "objective" scores students receive
     on a standardized reading achievement test are
     determined by the norm group to whom they are
     compared, these tests tell little about student
     achievement unless this norm group is completely
     and accurately defined. Boards of education,
     the community, parents, and even professional
     educators often misinterpret achievement test
     scores for this very reason (p. 385).

Making the assumption that anyone below the 50th percentile is a disabled
learner.



As mentioned earlier, the very nature of a norm-referenced test means
that scores will be spread out or, to put it another way, that the test
will differentiate between weak and strong students. If we administer a
reading achievement test to a group of students similar in composition to
that of the norm group, we can expect approximately half of the students
to fall below the 50th percentile. A score below the 50th percentile is
therefore no evidence, by itself, of a learning disability.
Giving special instruction to enhance performance on the test.

Pressure generated through lack of understanding has frequently
resulted in specific instruction to raise scores on achievement tests. This
form of corruption negates any value to be gained from the test results.
Again, this kind of pressure can result from a misinterpretation of the
basic notion of norm-referenced tests. If, for example, superintendent X
finds that School A's achievement is considerably beyond that of School B,
and communicates his concern, pressure may be felt to note and select for
special emphasis particular areas from achievement tests. There may be good
reasons why the achievement of one school is different from the other:
socioeconomic level and teacher turnover, to mention only two. There is no
suggestion here to discourage close examination of test results on a
comparative basis to raise hypotheses about curriculum and instructional
practices.

Treating a grade score on a reading test as a functional reading level.

Standardized tests suffer wide abuse from over-interpretation.
Assuming that a grade score of 5.0 on a group test indicates either an
independent or instructional reading level is too typical. Again, the
global nature, item selection and group nature of most standardized
reading tests deny such interpretation. Functional levels are best
ascertained by allowing the reader "try-outs" with actual content material
or by administering an informal reading inventory.
Using test results to separate students for status purposes (assignment of
awards, grades, etc.).

Little needs to be said about the folly of using standardized test
results for assignment of grades or promotion purposes. Again, the nature
of content selection and the fact that most tests are designed to assess
achievement of groups place the validity of such uses in serious question.

Using test results for permanent grouping or streaming.

The widespread use of standardized tests for the purpose of grouping
continues in spite of clear evidence of the invalidity of such practice.
Perhaps the most convincing evidence that reading achievement test scores
do not differentiate specifically enough to ensure homogeneity is that
provided by Balow (1962). He found in his investigation that when four
classes of fifth graders were "streamed" on the basis of reading test
scores, the groups still were essentially heterogeneous. When specific
subskills were evaluated there was considerable overlap between the
highest and lowest streams. What, in fact, happens is that the global
comprehension score masks specific individual instructional needs. It is
not within the interest of this paper to debate the pros and cons of
homogeneous versus heterogeneous assignment of groups. There is, however,
sufficient evidence to support heterogeneous assignment even if we could
determine methods of "homogenizing" groups. There is no implication here
that short-term groups should not be formed for specific skill
instruction. The point is that a survey achievement test cannot adequately
accomplish such a differentiation.

After such an extended discourse on the abuses of standardized
testing, it may seem incredible that any value could be attributed to the
use of tests. This is not the case. Used for their intended purposes,
tests can yield considerable value.



USES OF STANDARDIZED TESTS

Using the test as an accountability check.



Traxler (1970) considers the most important value of a reading test,
or any other standardized test, to be the definiteness that it lends to
our thinking about a pupil or group (p. 226). To use a test as an
external monitor adds a degree of objectivity lacking in class or even
schoolwide tests. On a class, school, or system-wide basis, a
standardized program can serve to provide a basis for evaluation of
global aspects of the program. This applies both to ascertaining
changes in achievement over a number of years and to determining
effects of a program on a shorter term basis. The important caution is
that the basis is not as solid and dependable as the "bald, bold figures"
suggest (Traxler, p. 226) because of the limitations reviewed earlier.

Using the test as a screening device to determine further diagnostic needs. 

A systematic approach to identification of learners who need special
instruction is requisite to an efficient program. Ideally, this
identification begins with gross measures applied on a whole-class basis.
At this level the standardized survey test is the reasonable choice.
Subsequent measurements become successively more specific and precise for,
say, a small group of children who have been screened by means of a
survey test to indicate problems. Harris (1970) presents a pyramid model
of successive levels of screening. The model is illustrated here:

     General Assessment          Gross Measurement
     Analytical Assessment
     Case Study                  Fine Measurement

Successive Levels of Screening (Harris, 1970, p. 95)



At the analytical assessment level, it is likely that a group diagnostic
test would be administered. This would be supplemented with various
informal diagnostic techniques. At the case study level, individual
tests and observational procedures would be employed.

Using the test to generate hypotheses regarding instructional needs.



An atmosphere within a school that stimulates use of tests to
generate hypotheses about instructional needs is the next best thing to
an in-school research program. These hypotheses may relate to "across
curriculum" or "within curriculum" concerns.

Across the curriculum, for example, the concern may be to collect
data on relative achievement strengths and weaknesses in reading,
listening and mathematics. Suppose a group or an individual within the
class exhibits the following profile:



(Profile chart: grade scores from 1.0 to 5.5 on the vertical axis, plotted
for Math Concepts, Computation, Listening Comprehension and Reading
Comprehension.)



One thing becomes clear immediately. The learner certainly has the
capability to improve in reading, judging from his performance in
mathematics and listening comprehension. The questions which arise might
be the following: "If X is able to comprehend so well at the listening
level, obviously able to process information, is it word identification
skills which account for his problem in reading?" "If so, which specific
skills are lacking?"

It is not uncommon to use an across-curriculum examination as a
basis for determining reading expectancy levels. Otto and McMenemy (1973)
suggest use of both mental age (based on group intelligence tests) and
mathematics age as a basis for comparison with reading age. The following
excerpt shows how the classroom profile is set up and the information
gained from it:



TABLE I: Comparisons of Chronological, Mental, Reading,
and Arithmetic Ages of Seventh-Grade Pupils

        Chrono-                                      Difference  Difference
        logical    Mental     Reading    Arithmetic  Between     Between
Pupil   Age (CA)*  Age (MA)*  Age (RA)*  Age (AA)*   MA and AA   MA and RA

  1      12.0       14.3       15.0       13.2        -1.1        +0.9
  2      12.5       16.1       15.0       14.8        -1.5
  3      12.4       15.2       15.0       14.1        -1.1        +0.2
  4      11.9       16.7       14.6       14.5        -1.1        -2.1
  5      12.1       11.4       12.4       12.1        +0.9        +1.0
 16      11.8       12.0        9.1       11.2        -0.10       -2.11

* All ages are in years and months.
Taken from Smith and McMenemy (1973, p. 106).

Naturally small differences between M.A. and R.A. or A.A. and R.A.
have to be ignored as chance level differences resulting from measurement
error. On the other hand, differences in the magnitude of plus or minus
1.0 at the elementary level should be cause for careful further analysis.
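
Because the ages in Table I are written as years and months, "0.10" means
ten months and is not the same as "0.1", so the differences cannot be
found by ordinary decimal subtraction. The sketch below, in Python, shows
one way to do the arithmetic. It is an illustration only: the function
names are invented here, the sign convention (achievement age minus mental
age) is inferred from rows such as pupil 1, and treating "plus or minus
1.0" as twelve months is an assumption.

```python
def to_months(age: str) -> int:
    """Convert an age written as 'years.months' (e.g. '14.3') to months."""
    years, _, months = age.strip().partition(".")
    return int(years) * 12 + int(months or 0)


def format_ym(months: int) -> str:
    """Write a signed month count in the table's '+Y.M' notation."""
    sign = "-" if months < 0 else "+"
    years, rem = divmod(abs(months), 12)
    return f"{sign}{years}.{rem}"


def discrepancy(mental_age: str, achievement_age: str) -> int:
    """Achievement age minus mental age, in months.

    A negative value means achievement is below the expectancy
    suggested by mental age (assumed sign convention).
    """
    return to_months(achievement_age) - to_months(mental_age)


def needs_follow_up(months: int, threshold_months: int = 12) -> bool:
    """Flag differences of about a year or more (assumed threshold)."""
    return abs(months) >= threshold_months


# Pupil 1 in Table I: MA 14.3, RA 15.0
gap = discrepancy("14.3", "15.0")
print(format_ym(gap), needs_follow_up(gap))

# Pupil 16 in Table I: MA 12.0, RA 9.1
gap = discrepancy("12.0", "9.1")
print(format_ym(gap), needs_follow_up(gap))
```

Ages are kept as strings on purpose, since reading them as floating-point
numbers would lose the distinction between one month and ten months.
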

Perhaps even more important is the use of tests to examine problems
within a curriculum area. This is a particularly fruitful area for
stimulating instructional changes. An illustration of how such changes can
come about arose when the writer was engaged on a consultative basis in a
northern reserve school. The grade two pupils had been randomly assigned
to two classes. In May an achievement test was administered to both
classes. The test consisted of a Word Recognition Test (words in
isolation), Comprehending Significant Ideas, and Comprehending Specific
Instructions. Profiles were constructed for each individual child. An
analysis revealed that in class A, 17 out of 24 children were as high or
higher in word recognition as in either of the comprehension tests. In
class B the reverse was observed - 19 out of 26 pupils were at least as
high in comprehension as in word recognition. An examination of these
results led to serious discussion about the instructional program carried
out in classroom A. The changes in instruction the following year were
most apparent. The results at the end of the year confirmed the impact of
the instruction.

A standardized reading survey test can reveal considerable information
if it includes a wide range of comprehension questions. If these questions
are then clustered (e.g., #'s 3, 7, 11, 15, etc. are main idea; #'s 2, 4,
8, 14, 19, etc. are implied meanings and so on), the teacher can not only
determine individual instructional level but can also get an indication of
areas where her instruction tends to leave gaps.
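
The clustering just described can be sketched in a few lines of Python.
This is an illustration under stated assumptions rather than a scoring
procedure from any published test: the cluster lists contain only the item
numbers named above (the source trails off with "etc."), and the names
`ILLUSTRATIVE_CLUSTERS`, `skill_profile` and `class_profile` are invented
for the example.

```python
# Item numbers are those quoted in the text; the real clusters would be
# longer, so these lists are illustrative only.
ILLUSTRATIVE_CLUSTERS = {
    "main idea": [3, 7, 11, 15],
    "implied meaning": [2, 4, 8, 14, 19],
}


def skill_profile(correct_items, clusters=ILLUSTRATIVE_CLUSTERS):
    """Proportion of each cluster's items that one pupil answered correctly."""
    correct = set(correct_items)
    return {
        skill: sum(1 for item in items if item in correct) / len(items)
        for skill, items in clusters.items()
    }


def class_profile(all_correct_items, clusters=ILLUSTRATIVE_CLUSTERS):
    """Average the pupils' profiles to show where a whole class is weak."""
    profiles = [skill_profile(items, clusters) for items in all_correct_items]
    return {
        skill: sum(p[skill] for p in profiles) / len(profiles)
        for skill in clusters
    }


# Hypothetical pupil who answered items 3, 7, 11 and 2 correctly.
print(skill_profile([3, 7, 11, 2]))

# Hypothetical class of two pupils.
print(class_profile([[3, 7, 11, 15], [3, 7]]))
```

A pupil's profile points to individual needs, while a cluster that is low
across the class average suggests a gap in instruction rather than in any
one learner.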

Zehm (1975) reports findings of research which resulted from the
discovery that second grade students in San Francisco schools dropped well
below the national norms in reading. The investigation isolated four
schools where the reverse was true - reading scores were above the
national average - to determine the sources of success. Class size was not
the variable; nor was the number of minority students. In fact, the
researchers found 40 to 100 percent minority students in these classes.
Further investigation revealed that neither technique, capital outlay nor
method was the key. Methods, in fact, varied from highly structured
approaches to the more flexible style of the open classroom (p. 25). The
key to success was found in the attitude of the teachers. They were
enthusiastic, positive, optimistic about their students' potential, and
emphasized reading in every subject.



There is no need to summarize in an attempt to debate whether abuses
of tests outweigh their uses. This should, in fact, never become an
issue. We know the value that tests can have; we only need to avoid the
widespread abuse of these instruments.


